Efficient Implementation of Lazy Suffix Trees

نویسندگان

  • Robert Giegerich
  • Stefan Kurtz
  • Jens Stoye
چکیده

We present an efficient implementation of a write-only topdown construction for suffix trees. Our implementation is based on a new, space-efficient representation of suffix trees which requires only 12 bytes per input character in the worst case, and 8.5 bytes per input character on average for a collection of files of different type. We show how to efficiently implement the lazy evaluation of suffix trees such that a subtree is evaluated not before it is traversed for the first time. Our experiments show that for the problem of searching many exact patterns in a fixed input string, the lazy top-down construction is often faster and more space efficient than other methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A hard-disk based suffix tree implementation

Suffix trees are incredibly useful structures for computational genomics and combinatorial pattern matching. Due to the small alphabet sizes used in computational genomics, specialised hard-disk based suffix trees have been designed, but the problem of creating an efficient hard-disk based suffix tree for large and unbounded alphabet sizes remains essentially unsolved. We have designed a hard-d...

متن کامل

Efficient Implementation of Suffix Trees

We study the problem of string searching using the traditional approach of storing all unique substrings of the text in a suffix tree. The methods of path compression, level compression and data compression are combined to build a simple, compact and efficient implementation of a suffix tree. Based on a comparative discussion and extensive experiments, we argue that our new data structure is su...

متن کامل

From Nondeterministic Suffix Automaton to Lazy Suffix Tree

Given two strings, a pattern P of length m and a text T of length n over some alphabet Σ of size σ, we consider the exact string matching problem, i.e. we want to report all occurrences of P in T . The well-known Backward-Nondeterministic-DAWG-Matching (BNDM) algorithm is one of the most efficient algorithm for short to moderate length patterns. In this paper – as a prelude – we take the underl...

متن کامل

Compact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth

Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even...

متن کامل

Computing suffix links for suffix trees and arrays

We present a new and simple algorithm to reconstruct suffix links in suffix trees and suffix arrays. The algorithm is based on observations regarding suffix tree construction algorithms. With our algorithm we bring suffix arrays even closer to the ease of use and implementation of suffix trees.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Softw., Pract. Exper.

دوره 33  شماره 

صفحات  -

تاریخ انتشار 1999